-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable minion's IPC channel to aggregate results from spawned jobber processes #61468
Enable minion's IPC channel to aggregate results from spawned jobber processes #61468
Conversation
…processes. Use a long-running request channel in the minion parent process to communicate job results back to the master via broker-based or broker-less transport. This is a necessary optimization for transports that prefer a sustained long-running connection because connection create/dispose operations are expensive. The working assumption is that this change benefits all supported transports. Testing Done: * this tests provide coverage for this use case: .../salt/tests/pytests/integration/minion.*
re-run full all |
You might consider also updating https://github.com/saltstack/salt/blob/master/doc/topics/development/architecture.rst#minion-job-flow. |
Thanks. I will update https://github.com/saltstack/salt/blob/master/doc/topics/development/architecture.rst#minion-job-flow shortly. |
re-run full all |
…r events intended for the master's ReqServer. Tag these specific events with a specific tag "__master_req_channel_payload" Update the architecture flow accordingly: https://github.com/saltstack/salt/blob/master/doc/topics/development/architecture.rst#minion-job-flow Testing Done: - updated unit tests, - tested manually with zeromq
Thanks. I can reproduce the test failure locally. Looks like the fix may need be in the TCP transport itself. I will try to update the PR with the fix. |
@devkits Can you please add a changelog for this PR? Thanks! |
7215701
Congratulations on your first PR being merged! 🎉 |
Since [1] minions now return all returns to all masters, instead of just the master that spawned the job. The upstream change in behaviour overloads our global masters making them unusable, so this aims to revert to the previous behaviour whilst maintaining the single-channel return improvements also introduced in [1]. [1]: saltstack#61468 Upstream-bug: saltstack#62834 Signed-off-by: Joe Groocock <jgroocock@cloudflare.com>
…sc#1213257) This reverts commits: saltstack/salt@a99ffb5 saltstack/salt@80ae518 saltstack/salt@3c7e1ec saltstack/salt@171926c From this PR: saltstack/salt#61468 See: saltstack/salt#62959 (comment)
…sc#1213257) * Revert usage of long running REQ channel (bsc#1213960, bsc#1213630, bsc#1213257) This reverts commits: saltstack/salt@a99ffb5 saltstack/salt@80ae518 saltstack/salt@3c7e1ec saltstack/salt@171926c From this PR: saltstack/salt#61468 See: saltstack/salt#62959 (comment) * Revert "Fix linter" This reverts commit d09d2d3. * Revert "add a regression test" This reverts commit b2c32be. * Fix failing tests after reverting commits
What does this PR do?
Enable minion's IPC channel to aggregate results from spawned jobber processes. Use a long-running request channel in the minion parent process to communicate job results back to the master via broker-based or broker-less transport.
This is a necessary optimization at scale for transports that prefer a sustained long-running connection because connection create/dispose operations are expensive. The working assumption is that this change benefits all supported transports.
Testing Done:
.../salt/tests/pytests/integration/minion.*
Previous Behavior
Minion would spawn a process that would run a job and create/dispose a new transport connection just to communicate one job result. This does not scale well for transports that prefer persistent connections because connection operations are expensive.
New Behavior
Minion would spawn a process that would run a job and communicate result via an IPC channel to the parent minion process. Parent minion process would use a long-running connection to communicate result back to master. This approach scales better because it reduces connection churn when thousands of minions are responding to the master.
Fixes: #61274